Introduction


Firewalls,  cryptography,  intrusion  detection,  and  network  management  are  all  valuable  in  the 
defense of computer-controlled systems. However, these technologies are directed at preventing
outsiders  from  doing  harm  to  the  protected  assets.  They  do  not  address  the  threat  from  insiders. 
These  already  authenticated  users  can  sidestep  many,  if  not  all,  conventional  security  measures. 
Critical  assets  need  a  further  layer  of  security  to  protect  them  from  hostile  agents  that  have  already 
breached  the  walls.  To  address  this,  we  have  been  working  on  Skeptical  Systems. 

Current  systems  execute  any  commands  issued  by  an  authenticated  user  so  long  as  they  fall  within 
the  privileges  granted.  By  contrast,  a  skeptical  system  entertains  doubts  about  the  tasks  that  it  is 
asked  to  perform.  It  questions  the  authenticity,  integrity,  and  intent  of  the  requester  and  acts  with 
due  consideration  of  its  doubts.  Skeptical  system  features  can  be  embedded  in  a  protected  asset,  or 
wrapped  around  it  as  a  guardian.  Skeptical  Systems  are  not  a  technology  per  se,  but  a  philosophy 
of  interaction  that  grants  considerable  autonomy  to  the  protected  system  under  certain  conditions. 
Some  existing  systems  already  embody  this  stance  in  limited  ways.  We  intend  to  take  the  skeptical 
stance  several  steps  further,  including  the  following  novel  aspects: 


Figure  1.  Skeptical  System 

•  the  continuous  assessment  of  apparent  intent  behind  a  command  stream  through  probabilistic 
task  tracking 

•  the  continuous  assessment  of  the  impact  of  commands  based  on  application  models 

•  the  continuous  assessment  of  the  integrity  and  authenticity  of  the  command  issuer  through  user 
modeling 

• graded responses to threat based on the level of skepticism and the proximity of the threat

This  paper  describes  some  of  the  technical  components  we  have  developed  for  hostile  intent 
recognition,  and  documents  specific  experiments  to  evaluate  a  demonstration  skeptical  system  in  a 
cyber security domain. For this demonstration we selected, as the application to protect, the
Honeywell Labs in-house Contract Management System (CMS), shown in Figure 2. The CMS is an
application  used  by  contract  officials  and  contract  managers  within  the  company  for  creating  and 
monitoring  the  execution  of  contracts  and  their  deliverables.  It  is  used  to  track  all  of  the 
information  critical  to  each  contract  including:  deliverables,  schedules,  contact  personnel,  shipping 
addresses, and other information for all contracts.

Figure 2. The Contract Management System user interface.

The CMS is the central repository for all contract information, so an insider who wanted to cause
significant disruption to our work would find it an obvious target. An insider could gain
valuable information about Honeywell Labs contracts from this source to sell to outsiders. Further,
contracts could be modified to change when and where deliverables are sent. Deliverables and even
whole contracts could be deleted from the system. The objective of our skeptical system is to
demonstrate  the  ability  to  recognize  and  prevent  these  sorts  of  actions  on  the  CMS  with  minimal 
impact  on  legitimate  users. 

The rest of this paper is broken down into three major sections. The first section gives an
overview of the system architecture and discusses the technologies behind our demonstration system.
The  next  section  will  discuss  a  number  of  demonstration  scenarios  and  the  system’s  performance 
on  them.  The  third  section  will  discuss  scalability  experiments  done  on  the  intent  recognition 
algorithm  (one  of  the  central  components  of  the  system).  The  report  will  close  with  general 
conclusions about the system and make some suggestions for areas for future work.

System Architecture


In general the architecture for our implemented skeptical system has five components. The
architecture is depicted in Figure 3. The intent recognition component, the user classification and
anomaly detection component, and the authentication modules function as sensors. Each takes in
a record of the user’s actions and inputs and produces, respectively, a belief about the user’s
intentions, a belief about the “normality” of the actions the user is performing, and a belief
about the user’s identity.

The  Threat  Assessment  module  combines  the  inputs  from  these  three  modules  and  produces  an 
evaluation  of  the  threat  represented  by  the  user’s  actions  to  the  system.  On  the  basis  of  this 
assessment  the  Response  Planner  can  then  choose  to  either  execute  the  actions  the  user  has  called 
for  or  call  for  other  actions.  Note  that  in  some  cases  user  actions  can  directly  trigger  actions  by  the 
Response  Planner.  This  is  designed  to  cover  the  case  of  actions  that  are  so  obviously  hostile  to  the 
system’s  integrity  that  they  call  for  immediate  action. 


Figure 3. Skeptical System architecture. (Diagram labels: Skeptical system; filtered user commands.)

The  following  subsections  will  provide  more  detail  on  how  each  of  the  modules  of  the  Skeptical 
System  architecture  works. 

Technical  Approach  to  Threat  Assessment 

The  threat  assessment  module  combines  models  of  individual  users  with  online  information  about 
activity observed by the detection modules. The result is represented within a Bayesian belief
network, which is dynamically updated by new observations (Jensen, 1996). Some of the nodes in
this  network  are  potentially  observable — when  observations  are  obtained,  particular  variables  are 
clamped  to  observed  values.  Other  nodes  are  never  directly  observable,  so  belief  in  the  possible 
states  is  computed  as  a  function  of  the  observed  nodes  and  prior  beliefs. 

Although  treated  as  a  single  entity,  the  threat  assessment  belief  network  consists  of  two  sections: 
one  for  user  models,  and  one  for  session  models.  A  session  corresponds  to  a  single  login  session 
for  the  CMS  application,  starting  with  an  attempt  to  authenticate  and  ending  with  a  logout  or  other 
program  termination.  The  user  model  section  consists  of  a  set  of  beliefs  about  the  motivations  and 
operating  habits  of  each  user  known  to  the  system;  these  models  persist  across  sessions.  A 
simplified example is shown in Figure 4.

Figure 4. Bayesian belief network generated for modeling user threat.

The user model section has two roles. First, it assigns a belief to various “motives” for each user.
In our simple example, the possible motives are “loyal”, “spy” or “saboteur”. More refined
characterizations  are  possible,  but  the  intention  is  to  capture  the  concept  of  motivation  at  a  high 
enough  level  to  encompass  a  broad  range  of  plans.  The  true  user  motivations  are  always  unknown, 
but  may  be  inferred.  A  knowledge  base  maintains  prior  information  about  the  users  and  possible 
classes  of  motives.  User  models  are  assigned  to  real  people,  as  opposed  to  login  identities — a  real 
person  may  assume  multiple  identities  when  interacting  with  the  computer.  In  addition  to  the 
known  set  of  registered  users,  there  is  an  “unknown  user”.  The  unknown  user  is  a  real  person  who 
is not registered, perhaps because he/she is an interloper, e.g. an outside hacker or an insider who is
not normally authorized to use the system. The unknown user is represented as a single individual
in our prototype, but there might be zero or more than one such person, operating independently or
in conspiracy; handling this is an area of interest for future research. Initially, prior belief
designates all users as highly likely to be loyal, except for the unknown user, whose motives are
automatically  suspicious.  An  additional  part  of  the  knowledge  base  describes  the  relation  between 
motives  and  likely  plans.  For  example,  a  user  with  a  strong  “spy”  motive  is  unlikely  to  engage  in 
plans  whose  goals  are  primarily  destructive  (sabotage).  Insiders  are  however  assumed  to  continue 
to  have  motivation  to  carry  out  normal  activities,  i.e.  to  carry  out  regular  work  assignments.  A 
second  role  of  the  user  model  section  is  to  characterize  user  habits.  This  is  described  in  the  section 
User  Classification  and  Anomaly  Detection. 

Session  models  are  built  dynamically  by  the  threat  assessment  module  as  users  authenticate.  Each 
new  session  generates  a  unique  session  identifier  and  this  event  causes  a  new  subgraph  to  be  added 
to  the  threat  assessment  belief  network.  One  portion  of  this  new  subgraph  relates  to  the  identity  of 
the  user,  and  another  portion  to  the  goals  of  the  session.  The  session  model  subgraph  is  connected 
to  the  user  model,  since  user  motives  and  habits  will  causally  affect  the  activities  of  sessions.  A 
distinguished  node  in  the  session  network  connects  the  true  (unobservable)  root  goal  for  the  session 
to  the  PHATT-reported  goal  for  the  session  (observable).  This  “true  goal”  node  is  the  principal 
product  of  the  threat  assessment  system  passed  to  the  response  planning  system.  It  reflects  the  net 
assessment  of  both  the  user  motives  and  of  the  PHATT  plan  recognition  system. 
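
The way the “true goal” node blends user motives with the PHATT report can be illustrated with a small discrete Bayes computation. This is only a sketch under stated assumptions, not the belief network used in the prototype: the motive prior, the motive-to-goal table, the 80% report accuracy, and the names `report_likelihood` and `true_goal_posterior` are all invented for illustration.

```python
# Illustrative numbers only; none of these values come from the paper.
MOTIVE_PRIOR = {"loyal": 0.98, "spy": 0.01, "saboteur": 0.01}
GOALS = ("normal", "spying", "sabotage")

# P(true session goal | motive). Hostile insiders are assumed to keep
# doing mostly normal work, as described in the text.
GOAL_GIVEN_MOTIVE = {
    "loyal":    {"normal": 0.98, "spying": 0.01, "sabotage": 0.01},
    "spy":      {"normal": 0.60, "spying": 0.39, "sabotage": 0.01},
    "saboteur": {"normal": 0.60, "spying": 0.01, "sabotage": 0.39},
}


def report_likelihood(reported, true_goal, accuracy=0.8):
    """P(PHATT reports `reported` | true goal), assuming uniform errors."""
    if reported == true_goal:
        return accuracy
    return (1.0 - accuracy) / (len(GOALS) - 1)


def true_goal_posterior(reported_goal):
    """Return P(true goal | PHATT report), marginalizing over motives."""
    unnormalized = {}
    for goal in GOALS:
        prior_goal = sum(
            MOTIVE_PRIOR[m] * GOAL_GIVEN_MOTIVE[m][goal] for m in MOTIVE_PRIOR
        )
        unnormalized[goal] = prior_goal * report_likelihood(reported_goal, goal)
    total = sum(unnormalized.values())
    return {goal: p / total for goal, p in unnormalized.items()}


posterior = true_goal_posterior("spying")
```

With these made-up numbers, a single spying report raises belief in a spying goal from roughly 1% to roughly 10%, while the strong loyal prior keeps “normal” the most likely true goal.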


Technical  Approach  to  Response  Planning 

The  skeptical  response  planner  is  a  simple,  fast,  and  efficient  mechanism  for  effecting  a  graded 
response  to  insider  threat.  It  combines  inputs  from  PHATT,  a  Bayesian  threat  evaluator,  a  response 
state  model,  and  a  response  action  ontology  to  determine  the  best  response  to  apply  to  a  given 
situation.  The  collection  of  preconditions  and  associated  response  instantiation  definitions  is  called 
the  response  library.  Although  called  a  “planner,”  the  response  planner  does  not  presently  plan 
beyond  the  next  applicable  action(s)  to  a  given  situation  since  its  goal  is  simply  to  preserve  a  safety 
level  for  the  application  within  the  current  application  context. 

The  response  planner  is  an  integral  part  of  a  skeptical  system  instantiation,  so  it  communicates  with 
other  skeptical  system  components  through  shared  memory.  It  communicates  the  appropriate 
response  to  its  host  application,  when  appropriate,  through  an  XML  stream  implemented  on  a 
socket  layer. 

The  skeptical  response  planner  selects  the  best  response  available  from  its  response  library  for  a 
given  situation  and  then  signals  its  host  application  to  apply  the  response.  To  select  the  best 
operator,  the  planner  tests  the  applicability  of  each  response  in  the  response  library.  If  the  response 
is  applicable,  its  cost  is  calculated  -  cost  is  a  unitless  measure  of  relative  utility,  where  lower  cost  is 
better.  If  there  exists  an  applicable,  lowest  cost  action,  it  is  selected  and  then  applied. 
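
The selection step described above (test applicability, compute cost, pick the lowest-cost applicable response) can be sketched as follows. The response names are taken from the response library listed later in this paper, but the `Response` class, the suspicion thresholds, and the cost formula are hypothetical stand-ins; the paper does not specify how cost is computed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Situation = Dict[str, float]


@dataclass
class Response:
    name: str
    applicable: Callable[[Situation], bool]  # precondition test
    cost: Callable[[Situation], float]       # unitless; lower is better


def make_cost(intrusiveness: float, mitigation: float) -> Callable[[Situation], float]:
    """Hypothetical cost: burden on the user plus the risk left unmitigated."""
    def cost(s: Situation) -> float:
        return intrusiveness + 20.0 * s["suspicion"] * (1.0 - mitigation)
    return cost


def select_response(library: List[Response], situation: Situation) -> Optional[Response]:
    """Return the lowest-cost applicable response, or None if none applies."""
    candidates = [r for r in library if r.applicable(situation)]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.cost(situation))


# Thresholds and weights are invented for the example.
LIBRARY = [
    Response("start_session_log", lambda s: s["suspicion"] >= 0.2, make_cost(1.0, 0.2)),
    Response("notify_supervisor", lambda s: s["suspicion"] >= 0.5, make_cost(3.0, 0.5)),
    Response("terminate_session", lambda s: s["suspicion"] >= 0.9, make_cost(6.0, 1.0)),
]

chosen = select_response(LIBRARY, {"suspicion": 0.6})
```

Because the cost depends on the situation, the cheapest applicable response shifts from logging to supervisor notification to session termination as suspicion rises.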

Operator  applicability  is  tested  using  the  operator’s  preconditions.  The  preconditions  are 
established in the operator definition. The operator preconditions are structured as if...elseif
structures.  This  means  that  there  may  be  arbitrarily  many  mutually  exclusive  situations  in  which  an 
operator  may  apply;  additionally,  each  situation  may  parameterize  the  resulting  operator 
instantiation differently.

Each test contained within the precondition’s if...elseif structure may be an arbitrarily complex test
of a given scenario as identifiable through queries to PHATT, the Bayesian threat evaluator, the
response state model, and the action ontology. This enables a graded response to a large number of
action sequences, with precise control over the precondition test granularity. The natural way to
model the graded response from one scenario to another is as a finite state machine (FSM) where
each class of scenarios is represented by part of the FSM, and where each scenario in a class is
represented as a state in its FSM. There are typically a number of synonymous motive state
changes that will switch evaluation from one part of the FSM to another.

Figure  5  diagrams  a  possible  transition  in  the  FSM.  For  a  given  set  of  threat  evaluation  motives 
and  plan  intentions,  which  represent  the  states  in  the  FSM,  there  are  transitions  for  new  plan 
intentions  and  action  classes. 


Figure  5.  A  depiction  of  the  skeptical  response  planner  FSM. 

A key question, then, is at what granularity of change the response transitions should be coded. We
found that modeling operator precondition change at a coarse granularity was sufficient to
cover threatening scenarios of interest without being too invasive. Modeling precondition test
change at a coarse granularity means that only a relatively large change in threat evaluation
and PHATT intention identification would push the skeptical response planner into new modes of
response. However, there were some situations where there was increasing suspicion by the
skeptical  system  and  then  a  potentially  destructive  action,  where  we  would  want  to  respond  more 
aggressively.  An  action  ontology  provided  the  necessary  instrument  to  provide  that  result,  where 
actions  were  classified  as  read,  change,  or  delete  actions,  and  where  change  and  delete  actions 
could  then  be  questioned  or  aborted  with  increasing  but  still  low  levels  of  user  suspicion  on  the  part 
of  the  skeptical  system. 
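
The role of the action ontology can be sketched as a lookup from action class to suspicion thresholds, so that change and delete actions are questioned or aborted at lower suspicion than reads. The read/change/delete classes come from the text; the action names, the threshold values, and the function `graded_response` are assumptions made for illustration.

```python
# Hypothetical CMS action names mapped to the ontology classes in the text.
ACTION_CLASS = {
    "view_report": "read",
    "edit_ship_to_address": "change",
    "delete_contract": "delete",
}

# Invented thresholds per class: (question_at, abort_at).
# More destructive classes trigger a response at lower suspicion.
THRESHOLDS = {
    "read":   (0.70, 0.95),
    "change": (0.30, 0.60),
    "delete": (0.15, 0.40),
}


def graded_response(action, suspicion):
    """Return 'execute', 'question', or 'abort' for an action at a suspicion level."""
    question_at, abort_at = THRESHOLDS[ACTION_CLASS[action]]
    if suspicion >= abort_at:
        return "abort"
    if suspicion >= question_at:
        return "question"
    return "execute"


decisions = {
    action: graded_response(action, 0.35) for action in ACTION_CLASS
}
```

At the same moderate suspicion level (0.35 here), a read proceeds while a change is questioned, which is the graded behavior the paragraph above describes.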

The response library included the following responses:

•  Take  a  picture  of  the  user 

• Lock the screen, requiring a fingerprint

• Back up any changes made to the database

• Lock the account, requiring a password

• Require the CMS password

•  Require  supervisor  authentication 

•  Notify  the  supervisor 

•  Start  a  session  log 

•  Validate  last  command  (Are  you  sure  you  want  to  X?) 

•  Fake  form/report  data 

•  Warn  that  the  last  command  is  not  appropriate 

•  Notify  the  system  administrator 

•  Abort  the  last  action 

•  Terminate  the  user’s  session 


Technical  Approach  to  User  Classification  &  Anomaly  Detection 

By  design,  the  skeptical  system  is  never  completely  certain  of  the  identity  of  the  person  directing  a 
given  session.  A  session  begins  with  a  nominal  user  requesting  to  login  and  presenting  a  password. 
The  actual  user  may  be  any  of  the  registered  users  or  the  special  “unknown”  user.  The  nominal 
user  is  always  known,  but  the  actual  user  can  only  be  inferred  with  greater  or  lesser  confidence. 
Inferring  the  actual  user  from  the  nominal  user  (and  other  evidence)  is  the  job  of  the  user 
classification  and  anomaly  detection  module.  This  function  is  tightly  integrated  into  the  threat 
assessment  Bayesian  belief  network,  a  portion  of  which  is  shown  in  Figure  4. 

The  skeptical  system  prototype  uses  a  modular  approach  to  evidence  sources  for  user  classification. 
This  allows  it  to  be  easily  extended  as  new  methods  of  identification  become  available.  Currently 
experiments  have  used  the  following  forms  of  evidence: 

o  Nominal  user  (who  the  user  claims  to  be) 
o  Access  mode  anomaly  detection 
o  Operating  system  native  authentication 
o  Biometric  devices 

The  access  mode  anomaly  detection  approach  is  based  on  the  workstation  from  which  the  user 
started  the  session,  and  the  time  of  day.  Part  of  the  user  model  for  each  actual  user  is  an  estimate  of 
the  likelihood  that  the  user  will  begin  a  session  from  a  particular  workstation  at  a  particular  time  of 
day  (broken  into  blocks  such  as  morning,  afternoon,  evening,  weekend).  Once  a  session  has 
started,  this  classifier  yields  evidence  that  implicates  one  or  more  actual  users.  A  slow  updating 
process  can  track  changing  user  habits.  The  operating  system  of  the  user  workstation  will  also 
typically  demand  some  form  of  native  authentication.  The  CMS  skeptical  agent  inside  the  client 
application  will  detect  the  owner  of  the  process,  and  pass  this  name  to  the  skeptical  system  as  a 
further  piece  of  evidence.  Finally,  the  skeptical  system  may  ask  for  biometric  authentication  at  any 
time  during  a  session;  the  result  also  adds  evidence  for  an  actual  user.  As  biometric  authentication 
was  not  one  of  the  research  foci,  we  used  commercially  available  hardware  and  software  additions 
in  our  testbed.  Client  systems  were  fitted  with  a  URU  fingerprint  scanner  and  a  LogiTech  camera 
to  take  pictures  of  the  current  user.  Other  forms  of  evidence  such  as  the  keystroke  timings  or  mouse 
motion patterns could readily be incorporated.

Note that the updating of the belief network assigns a posterior distribution to all of the nodes.
Evidence  implicating  a  particular  actual  user  may  influence  the  assessment  of  the  likelihood  of  a 
certain  session  goal.  For  example,  a  plan  to  exfiltrate  data  is  more  plausible  when  an  individual  that 
is  already  suspected  of  spying  motives  is  believed  to  be  the  actual  user.  Conversely,  observation  of 
a  plan  consistent  with  spying  may  make  a  suspected  saboteur  seem  less  likely  as  an  actual  user. 
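
The access mode evidence described in this section can be sketched as a Bayes update over candidate actual users, given the workstation and time block at which a session starts. The user name “jphelps” appears in the experiments later in this paper; the second user, the workstation names, and every probability below are hypothetical.

```python
# Prior over the actual user given that the nominal user is "jphelps".
# All numbers are invented for illustration.
PRIOR_GIVEN_NOMINAL = {"jphelps": 0.90, "asmith": 0.05, "unknown": 0.05}

# P(session starts at (workstation, time block) | actual user).
ACCESS_MODEL = {
    "jphelps": {("ws-12", "morning"): 0.60, ("ws-40", "evening"): 0.01},
    "asmith":  {("ws-12", "morning"): 0.02, ("ws-40", "evening"): 0.30},
    "unknown": {("ws-12", "morning"): 0.10, ("ws-40", "evening"): 0.10},
}
DEFAULT_LIKELIHOOD = 0.01  # used for access modes absent from a user's model


def update_identity(prior, workstation, time_block):
    """Return P(actual user | access mode) via Bayes' rule."""
    mode = (workstation, time_block)
    unnormalized = {
        user: p * ACCESS_MODEL[user].get(mode, DEFAULT_LIKELIHOOD)
        for user, p in prior.items()
    }
    total = sum(unnormalized.values())
    return {user: value / total for user, value in unnormalized.items()}


usual = update_identity(PRIOR_GIVEN_NOMINAL, "ws-12", "morning")
unusual = update_identity(PRIOR_GIVEN_NOMINAL, "ws-40", "evening")
```

A login from the usual workstation and time block reinforces the nominal identity, whereas an unusual access mode shifts belief toward other registered users and the unknown user. Further evidence sources, such as the operating system's process owner or a biometric result, would multiply in as additional likelihood terms.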


Technical  Approach  to  Hostile  Intent  Recognition 

Our  plan/intent  recognition  framework  PHATT  (Probabilistic  Hostile  Agent  Task  Tracker)  is  based 
on  the  realization  that  plans  are  executed  dynamically  and  that  at  any  given  moment  the  agent  is 
able  to  execute  any  one  of  the  actions  in  its  plans  that  have  been  enabled  by  its  previous  actions.  To 
formalize  this,  initially  the  executing  agent  has  a  set  of  goals  and  chooses  a  set  of  plans  to  execute 
to achieve these goals. The set of plans chosen determines a set of pending primitive actions. The
agent executes one of the pending actions, generating a new set of pending actions from which the
next action will be chosen.

Figure 6. Probabilistic Hostile Agent Task Tracker (PHATT)


Each  pending  set  is  generated  from  the  previous  set  by  removing  the  action  just  executed  and 
adding  newly  enabled  actions.  Actions  become  enabled  when  their  required  predecessors  are 
completed.  This  process  is  illustrated  in  Figure  6.  To  provide  some  intuition  for  the 
probabilistically-inclined,  the  sequence  of  pending  sets  can  be  seen  as  a  Markov  chain,  and  the 
addition  of  the  action  executions  with  unobserved  actions  makes  it  a  hidden  Markov  model.  There 
are  a  number  of  probabilistic  reasoning  algorithms  that  can  be  used  to  infer  the  agent’s  goals  given 
such  a  Markov  model.  We  have  designed  a  new  particularly  efficient  algorithm  that  makes  use  of 
the deterministic aspect of much of the problem and only introduces probabilities where necessary.
As  a  result,  PHATT  is  able  to  handle  a  number  of  problems  that  are  critical  to  real  domains 
including:  multiple  interleaved  goals,  partially  ordered  plans,  the  effects  of  context  on  the  goals 
adopted,  the  effect  of  negative  evidence  or  failure  to  observe  (“the  dog  didn’t  bark”),  missing 
observations, and observation of “failed actions.” See (Geib & Goldman, 2001a, b and 1999) for a
more complete discussion of the system, its algorithm, and these issues. This technology was
initially  developed  in  the  context  of  the  DARPA  CyberPanel  sponsored  Honeywell  Argus  project. 
We  are  currently  applying  it  to  recognizing  the  activities  of  daily  living  of  patients  in  assisted  living 
situations. 
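
The deterministic part of the pending-set model can be sketched in a few lines: each step removes the executed action and adds any actions whose predecessors are now complete. This sketch covers only that bookkeeping, not PHATT's probabilistic inference over goals; the plan, its action names, and the function `next_pending` are hypothetical.

```python
# Hypothetical partially ordered plan: action -> set of required predecessors.
PLAN = {
    "login": set(),
    "open_contracts_form": {"login"},
    "open_reports_form": {"login"},
    "search_contract": {"open_contracts_form"},
    "close_contract": {"search_contract"},
}


def next_pending(plan, pending, executed, action):
    """Execute `action` and return the new (pending, executed) sets."""
    if action not in pending:
        raise ValueError(f"{action!r} is not enabled")
    executed = executed | {action}
    # Newly enabled: not done, not already pending, all predecessors done.
    newly_enabled = {
        a for a, preds in plan.items()
        if a not in executed and a not in pending and preds <= executed
    }
    return (pending - {action}) | newly_enabled, executed


# Initial pending set: actions with no predecessors.
pending = {a for a, preds in PLAN.items() if not preds}
executed = set()
history = [set(pending)]
for observed in ["login", "open_contracts_form", "search_contract"]:
    pending, executed = next_pending(PLAN, pending, executed, observed)
    history.append(set(pending))
```

After the three observations, `open_reports_form` is still pending alongside the newly enabled `close_contract`, which shows how a partially ordered plan leaves several next actions available at once.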

The plan libraries divide into coverage of normal and malicious user behavior. Our modeling of
normal user behavior was CMS form centric, so each CMS form has a corresponding set of
PHATT plans describing normal usage. Since the plans defining malicious usage relied on normal
plan definitions, we modeled malicious theories of usage as separate domains - the sabotage and
spying domains. So, for instance, the Contract Management form had four normal usage plans -
adding  a  new  contract,  changing  an  existing  contract,  changing  contract  particulars,  and  closing  out 
a  contract.  The  sabotage  domain  then  contained  a  plan  describing  a  type  of  sabotage  that  involved 
closing  several  contracts  without  support  for  concluding  that  this  was  normal  behavior. 

PHATT’s conclusions about user intent are communicated back to a TCL-based Skeptical Monitor,
which displays PHATT’s distribution of belief over the plans, as shown in Figure 7.


Figure 7. Skeptical Monitor showing PHATT’s plan belief distribution.

Basic System Demonstration Experiments

Once  our  demonstration  system  was  built,  we  experimented  with  six  test  scenarios,  covering  the 
major  classes  of  response  behavior  for  the  skeptical  CMS  application.  The  first  scenario  covered 
innocuous  behavior,  where  the  user’s  login  attributes  and  plans  never  raised  significant  suspicion. 
The  second  scenario  covered  the  case  when  the  user’s  login  was  abnormal,  including  an  abnormal 
login  time  or  abnormal  login  session  ID.  The  third  and  fourth  scenarios  covered  sabotage  behavior, 
with  the  third  scenario  covering  the  less  overt  case  of  sabotage,  and  the  fourth  the  more  overt  case. 
The fifth and sixth scenarios covered spying behavior, with the fifth covering the less overt case of
spying, and the sixth the more overt case. In these cases the distinction between spying
and sabotage is that spies are only interested in acquiring information while the saboteurs actively
modify  or  delete  information  from  the  system. 

In the following sections, graphs of the system’s performance on each of these test scenarios are
presented, along with a brief discussion of the important events in each
scenario. Each set of graphs is best viewed in color and has the following structure.

Graph Descriptions. All of the panels are aligned along the sequence of observations.

1.  Time:  the  top  panel  shows  the  runtime  of  the  Skeptical  System,  partitioned  by  component,  for 
each  observation,  measured  in  milliseconds. 

2.  Response:  Shows  the  system  responses  at  the  stage  activated. 

3.  Normal,  malicious:  Display  the  system’s  belief  that  various  normal  or  malicious  plans  are  being 
followed  by  the  agent.  This  is  the  output  of  the  PHATT  component. 

4.  jphelps:  Displays  the  skeptical  system’s  belief  in  the  high  level  motives  of  the  nominal  user, 
“jphelps”  (loyal,  spying,  sabotage).  Other  users  are  tracked,  but  not  shown. 

5.  user:  Displays  the  system’s  belief  in  the  true  identity  of  the  user.  A  user  “unknown”  is  present 
to accumulate belief in “someone outside our knowledge”.

Figure 8. Innocuous


These  graphs  depict  innocuous  use.  As  the  user  executes  individual  actions,  the  system  generates 
observations,  which  are  recorded  in  a  log  and  displayed  in  the  chart.  To  conserve  chart  space,  the 
observations’  names  have  been  replaced  by  numbers.  These  observation  numbers  are  used  as  the 
horizontal  axis  of  the  bottom  panel.  The  plan  that  these  actions  reveal  is: 

•  Observations  1-6:  normal  login  procedure  for  the  contract  management  system. 

• Observations 7-18: accessing the contracts subform.

• Observation 19: accessing reports (this is a somewhat strange sequence of actions and quite
unlikely  to  occur  in  this  user's  normal  job  function  -  as  codified  in  the  user’s  default  motives  and 
behaviors). 

•  Observations  20-27:  accessing  more  reports  and  checking  deliverables,  but  not  staying  long  in 
any  one  place;  after  the  user’s  identity  has  been  revalidated,  this  barely  malicious  behavior  does  not 
raise the system’s concern.

Figure 9. Abnormal NetID

This scenario depicts a user accessing the system with an abnormal network ID. This abnormal
network  ID  reported  to  the  application  causes  immediate  high  suspicion  of  the  validity  of  the  user, 
causing  the  system  to  ask  for  more  authentication  information.  The  plan  that  the  user’s  actions 
reveal  is: 

•  Observations  1-6:  normal  login  procedure  for  the  contract  management  system;  since  the  user  is 
under  suspicion,  the  system  immediately  starts  replicating  any  changes  made  to  the  system,  in  case 
some  of  them  are  destructive.  The  system  also  asks  for  an  additional  system  password  to  confirm 
the  user’s  identity. 

• Observations 7-9: accessing the contracts subform.

• Observations 10-18: accessing aspects of the contracts subform necessary to change contracts,
and then proceeding to change a few contracts. As soon as the already suspect user starts to
appear  as  if  he  is  about  to  change  the  contracts,  the  system’s  wariness  increases,  partly  because  of 
the  two  other  explanations  available  to  it:  that  the  user  may  be  spying  on  contract  particulars  or 
sabotaging  contracts.  The  increased  wariness  causes  the  system  to  save  the  actions  of  the  user  in  a 
log  for  forensics  and  rollback  (together  with  replication).  After  the  system  starts  logging,  the  user 
continues  to  pursue  a  course  of  action  that  it  believes  is  potentially  malicious,  especially  now  that 
the  user’s  actions  are  not  in  the  scope  of  what  would  normally  be  expected  from  the  expected  user, 
which  drives  down  the  system’s  belief  that  the  user  is  who  the  current  authentication  information 
claims.  This  causes  the  system  to  take  more  immediate  precautions  against  loss.  The  supervisor  is 
notified  of  the  user’s  actions,  and  the  user  is  also  finally  warned  that  their  actions  are  not 
appropriate. 

•  Observations  19-30:  accessing  reports  in  detail  causes  the  system  to  become  very  suspicious  of 
the  user  and  the  system  admin  is  now  notified  of  potential  abuse.  Since  the  user  continues  with  a 
strange  mode  of  operation,  the  system  finally  requires  supervisor  authentication.  Once  that  is 
given,  the  system’s  belief  in  the  authenticity  of  the  user  goes  up,  but  this  does  not  change  the  belief 
in the type of plans the user is pursuing. Since the supervisor authentication information is given,
the user is allowed to continue using the system. If this information were not given, the user would
be effectively locked out.

Figure 10. Saboteur Complete

This  scenario  depicts  a  user  sabotaging  the  system.  The  plan  that  these  actions  reveal  is: 

•  Observations  1-6:  normal  login  procedure  for  the  contract  management  system. 

• Observations 7-9: accessing the contracts subform.

•  Observations  10-22:  accessing  contracts,  including  contract  details;  the  user  also  starts  closing 
out contracts. This causes the contract information to be lost.

• Observations 23-24: contract sabotage - belief in this plan type is driven up by repeated
contract closing with no other explanation for the behavior. All other aspects of the user’s identity
are  not  in  question,  but  the  pursuit  of  this  plan  causes  the  system  to  become  suspicious  of  the  user. 
The  pursuit  of  a  malicious  plan  causes  the  system  to  believe  the  user’s  actual  identity  is  unknown. 
It responds by logging all of the user’s actions and replicating any changes made.

Figure 11. Saboteur Incomplete

These graphics depict system states when a user comes one step away from executing a sabotage
plan.  The  plan  that  these  actions  reveal  is: 


•  Observations  1-6:  normal  login  procedure  for  the  contract  management  system  (the  system  fails 
to  record  a  login  closing  action  since  it  was  probably  malformed,  so  PHATT  keeps  this  plan  open). 

• Observations 7-36: accessing reports, closing two contracts, adding a deliverable, and changing a
deliverable’s ship-to address, all of which are considered nominal user actions in moderation.

These graphs depict a user spying on the system. A complete spying plan is pursued and the
system  responds  by  attempting  to  authenticate  the  user  with  higher  and  higher  confidence.  The 
plan  that  these  actions  reveal  is: 


•  Observations  1-7:  normal  login  procedure  for  the  contract  management  system. 

•  Observation  8:  accessing  the  report  submenu. 

•  Observations  9-19:  accessing  reports  (the  sequence  of  reports  requested  by  this  individual  is 
highly suspicious, and is recognized as a specific espionage plan, quite unlikely to occur in this
user's  normal  job  function). 

• Observations 20-21: exiting the application.

Figure 13. Spy Incomplete

These graphs depict a user who comes one step away from completing a spying plan. The plan
that these actions reveal is:

•  Observations  1-6:  normal  login  procedure  for  the  contract  management  system. 

• Observations 7-9: accessing the contracts manipulation submenu.

•  Observations  10-14:  changing  contract  particulars  -  requires  repeated  searching  of  contract 
particulars  -  actions  which  overlap  with  contract  spying. 

• Observations 15-31: accessing reports - PHATT concludes that, although the spying plan is not
yet complete, the user is in fact spying.

Observations 

System Runtimes: Notice that the processing for each of the observations takes no more than 600 ms
and in most cases less than 400 ms. The high cost of the response engine is attributable to the
particular  Bayesian  belief  network  engine  we  are  using  for  threat  analysis,  which  is  currently 
performing  some  redundant  calculations.  We  believe  this  can  be  significantly  reduced.  In  any 
case,  the  time  added  to  the  system’s  response  time  by  the  skeptical  system  will  not  generally  be 
perceptible  to  end  users. 

Issues 

MSBNx: Most seriously, just loading a relatively simple skeptical threat network in MSBNx, one
containing only seven users, and then attempting to assess the probabilities on the PHATT node
consistently crashes MSBNx with an “Out of Memory” error on a 1.0 GHz PIII with 256 MB RAM.

Also  of  concern,  however,  is  the  noticeable  system  slowdown  in  Skeptical  CMS  components  when 
the  number  of  users  in  the  MSBNx  network  exceeds  four.  PHATT  is  hit  hardest  by  MSBNx’s 
resource consumption. Its average observation compute time increases from 924.286 msec to
4017.143 msec when the threat model grows from six to seven users.

This  problem  can  be  mitigated  by  using  a  better  Bayes  Net  evaluation  product  and  by  user 
aggregation. We believe that instead of modeling each user, a composite user could be used to
model  all  “unexpected”  users  during  a  session. 

PHATT Algorithm Scalability Experiments

We have also conducted some initial experiments designed to allow us to understand the most
critical factors determining the runtime of the PHATT algorithm. The scalability of the PHATT
algorithm to real-world sized domains is necessary for the effective application of this technology.
Therefore it is critical that we have a better understanding of the effects that various facets of the
plan library will have on the algorithm’s runtime. Our initial hypothesis was that, while intuitively
the number of roots in the plan library might be assumed to have a large effect on the runtime of
the algorithm, other features of the plan library would in fact have more impact. The
experiments were also designed to be a starting point for more sophisticated analyses to follow.

Experimental  Design 

This  experiment  measures  the  computation  time  for  the  PHATT  algorithm  as  a  function  of  three 
factors of interest. It was conducted entirely in situ on a Sun Sunfire-880 with 8 GB of main
memory and four 750-MHz CPUs, which afforded the luxury of a complete factorial design with a
large number of replications (1000).

The  response  measured  was  the  cpu  time  (msec)  required  to  recognize  the  randomly  generated 
sequences  corresponding  to  plans  with  varying  characteristics.  Note  that  cpu  time  was  exclusive  of 
any  time  used  by  the  operating  system  or  by  other  processes  on  the  computer. 

The  factors  and  levels  are  summarized  below: 


Factor   Description                                         Levels
Order    The type of order constraints between plan nodes.   total, one, partial, unord
Depth    The level of plan depth                             3, 4
Roots    The number of root goals in the plan library        10, 100, 1000

For  each  experimental  condition,  a  single  plan  library  was  generated.  Each  such  plan  library  had  a 
number  of  features  including  the  three  tested  factors. 

1.  Order  (tested  factor):  This  is  an  indication  of  how  many  and  what  type  of  ordering  constraints 
there  were  between  the  actions  in  the  methods  in  the  plan  library. 

•  Total:  the  actions  are  linearly  ordered  with  each  action  having  a  single  ordering  constraint 
with  the  action  that  precedes  it  in  the  initial  definition. 

•  One:  all  of  the  actions  are  ordered  after  the  first  action  listed  in  the  definition.  However 
they  are  all  unordered  with  respect  to  any  action  other  than  the  first. 

• Partial: Each action (other than the first in the definition) may have one ordering constraint.
This constraint orders the action after one of the earlier actions in the definition, chosen at
random. This means that methods can vary all the way from being totally ordered to completely
unordered. This value for the ordering factor was specifically included in order to approximate
real-world plan libraries, in which the actions will be neither totally ordered nor completely
unordered.

•  Unord:  All  of  the  actions  are  unordered  with  respect  to  each  other. 

2.  Plan  Library  Depth  (tested  factor):  This  is  a  measure  of  the  depth  of  the  plan  trees.  In  these 
plan  trees  “or  nodes”  (choice  points)  and  “and  nodes”  (method  expansions)  alternate  levels.  In 
all  cases  the  root  is  defined  as  an  “or  node”  and  levels  alternate  as  they  go  down.  For  example 
in  the  case  of  depth  three  trees  the  root  is  an  “or  node”  followed  by  a  layer  of  “and  nodes” 
followed  by  leaf  nodes.  The  depth  four  trees  have  another  layer  of  “or  nodes”  before  the 
leaves. 

3.  Number  of  Roots  (tested  factor):  This  measures  the  number  of  root  nodes  in  the  plan  library  at 
10,  100  and  1000  roots  respectively. 

4.  Method  Branching  factor:  This  determines  the  number  of  actions  at  an  “and  node”  (method 
definition).  In  all  cases  this  was  fixed  at  4. 

5.  Choice  Branching  factor:  This  determines  the  number  of  actions  at  an  “or  node”  (choice  node). 
In all cases this was fixed at 3.
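
The four levels of the order factor described above can be sketched in code. This is only an illustration under stated assumptions, not the generator used in the experiments: the function name `ordering_constraints`, the representation of constraints as a predecessor map, and the parameter `p_constrained` (the chance that an action receives a constraint in the ‘partial’ case, which the text leaves unspecified) are all hypothetical.

```python
import random

ORDER_LEVELS = ("total", "one", "partial", "unord")


def ordering_constraints(actions, order, rng=None, p_constrained=0.5):
    """Map each action to the set of actions that must precede it.

    p_constrained is an assumed value; the report only says that an
    action "may have" one constraint in the 'partial' case.
    """
    if order not in ORDER_LEVELS:
        raise ValueError(f"unknown order level: {order!r}")
    rng = rng or random.Random()
    preds = {a: set() for a in actions}
    for i in range(1, len(actions)):
        a = actions[i]
        if order == "total":
            # Each action follows the one listed just before it.
            preds[a].add(actions[i - 1])
        elif order == "one":
            # Every action follows only the first action in the definition.
            preds[a].add(actions[0])
        elif order == "partial" and rng.random() < p_constrained:
            # At most one constraint, on a randomly chosen earlier action.
            preds[a].add(rng.choice(actions[:i]))
        # 'unord' adds no constraints at all.
    return preds


method = ["a1", "a2", "a3", "a4"]  # method branching factor of 4
for level in ORDER_LEVELS:
    print(level, ordering_constraints(method, level, random.Random(0)))
```

With four actions per method, ‘total’ and ‘one’ both produce three constraints, ‘unord’ produces none, and ‘partial’ produces anywhere from zero to three.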

Note  that  all  the  actions  in  the  plan  libraries  were  unique.  Thus,  once  an  action  is  observed  there  is 
actually  no  ambiguity  about  what  root  intention  the  action  must  contribute  to.  This  does  not  rule 
out the possibility of more than one instance of a given plan. However, this does allow us to make
several  inferences  about  the  effect  of  various  factors  on  the  algorithm’s  runtime.  We  will  return  to 
discuss  this  later. 

For  each  such  plan  library/experimental  condition,  1000  test  cases  were  generated.  To  generate  a 
test  case  three  unique  roots  were  selected  at  random.  For  each  of  these  roots  a  legal  linearization  of 
a  plan  for  that  root  was  generated,  by  choosing  a  single  sub-action  for  “or  nodes”  and  choosing  a 
legal  linearization  for  ordering  constraints  for  “and  nodes.”  Once  a  plan  was  generated  for  each  of 
the  three  roots  they  were  then  randomly  interleaved  maintaining  the  ordering  constraints  of  the 
individual  plans  but  mixing  the  actions. 
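
The test-case generation just described can be sketched as follows. This is a minimal illustration, assuming each plan's actions and ordering constraints are already available; the helper names `random_linearization` and `random_interleave` are hypothetical, and choosing uniformly among the ready actions or the remaining plans is just one simple way to randomize, not necessarily the scheme used in the experiments.

```python
import random


def random_linearization(actions, preds, rng):
    """Return a random order of actions that respects preds.

    preds maps an action to the set of actions that must come before it.
    Assumes the constraints are acyclic.
    """
    remaining = list(actions)
    done = set()
    sequence = []
    while remaining:
        # An action is ready once all of its predecessors were emitted.
        ready = [a for a in remaining if preds.get(a, set()) <= done]
        chosen = rng.choice(ready)
        remaining.remove(chosen)
        done.add(chosen)
        sequence.append(chosen)
    return sequence


def random_interleave(plans, rng):
    """Merge action sequences at random, keeping each plan's own order."""
    queues = [list(p) for p in plans if p]
    merged = []
    while queues:
        q = rng.choice(queues)
        merged.append(q.pop(0))  # always take the next action of that plan
        if not q:
            queues.remove(q)
    return merged


rng = random.Random(0)
plan_a = random_linearization(["a1", "a2", "a3"], {"a2": {"a1"}, "a3": {"a1"}}, rng)
plan_b = random_linearization(["b1", "b2"], {"b2": {"b1"}}, rng)
plan_c = random_linearization(["c1", "c2"], {}, rng)
print(random_interleave([plan_a, plan_b, plan_c], rng))
```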

For  each  of  the  1000  trial  points,  the  internal  clock  was  started,  and  PHATT  was  presented  with  the 
generated  action  sequence;  after  processing  the  sequence  PHATT  computed  the  probability 
distribution  over  the  root  goals.  At  this  point,  the  clock  was  halted  and  the  cpu  time  measured. 

This  time  was  recorded  for  the  condition.  The  files  containing  the  cpu  times  (and  other  statistics) 
were  parsed  and  combined  with  custom  Perl  programs  and  imported  into  S-Plus  for  charting  and 
data  analysis. 
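
The per-trial measurement can be sketched as a small timing harness. This is an assumption-laden illustration rather than the original instrumentation: `recognize` is a hypothetical stand-in for a call into PHATT, and Python's `time.process_time` reports user plus system CPU time of the current process, which may differ slightly from the clock used in the experiments.

```python
import time


def timed_cpu_msec(recognize, observations):
    """Run recognize(observations) and return (result, cpu time in msec)."""
    start = time.process_time()          # start the clock
    result = recognize(observations)     # process the whole action sequence
    elapsed = time.process_time() - start
    return result, elapsed * 1000.0      # seconds -> milliseconds


# Hypothetical usage with a trivial stand-in for the recognizer.
result, msec = timed_cpu_msec(len, ["a1", "b1", "a2"])
print(result, msec)
```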

Overview  of  the  Data 

The entire ensemble of data (24,000 points) is charted below against all three factors in a trellis
chart. Each narrow column of points represents an experimental condition. This chart is best
viewed in color. On the vertical axis, we have used a log scale for cpu time for compression
reasons, since it spans 5 orders of magnitude. Each row of panels shares the same log10 scale of
cpu time, with 1 msec added to deal with those data points that registered as 0.

The  horizontal  axis  is  the  number  of  roots  at  equally  spaced  log  units:  10,  100,  1000. 

The left column of panels is for depth 3 plans, and the right column of panels is for depth 4 plans.
The rows represent different order constraints. From the top down: unordered, total, partial, one.

Figure 14. Trellis Chart (vertical axis: log10(1 + cpu); horizontal axis: log10(n roots))


Comments 

Several  things  are  evident  from  this  graph.  The  type  of  order  constraint  has  a  profound  effect,  both 
on  means  and  on  variance.  Unordered  plans  exhibit  the  highest  means;  partially  ordered  plans  also 
have  relatively  high  means,  but  show  the  highest  variance,  regardless  of  the  other  factors.  The 
difference between order “one” and order “total” is not so obvious. The number of roots seems to
have  the  expected  effect  (increasing)  on  cpu  time,  but  the  size  of  this  effect  is  less  than  we  might 
have  anticipated  (recall  these  are  log  units).  The  plan  depth  (3  or  4)  likewise  seems  to  have  the 
expected  effect  (increasing)  on  cpu  time. 


ANOVA Model

While most of the interesting conclusions are evident in the graph and not in doubt, an analysis of
variance was conducted. After examination of residuals from the saturated model, it was clear that
the assumptions of normality and uniform variance are doubtful. After examining some transforms,
we found that this problem was substantially (though not completely) ameliorated by the log
transform of the response, much as was done in the previous multipanel chart (See Section 4.2).

Model - Anova Table

Short Output:
Call:
   aov(formula = log(1 + cpu) ~ order * depth * roots, data = all)

Terms:
                    order     depth     roots  order:depth  order:roots
 Sum of Squares  219167.2    7653.9   35494.6        148.3        167.1
Deg. of Freedom         3         1         2            3            6

                 depth:roots  order:depth:roots  Residuals
 Sum of Squares        348.0              263.1    13233.1
Deg. of Freedom            2                  6      23976

Residual standard error: 0.7429193
Estimated effects are balanced

                      Df  Sum of Sq   Mean Sq   F Value  Pr(F)
order                  3   219167.2  73055.73  132364.4      0
depth                  1     7653.9   7653.95   13867.6      0
roots                  2    35494.6  17747.31   32155.1      0
order:depth            3      148.3     49.44      89.6      0
order:roots            6      167.1     27.85      50.5      0
depth:roots            2      348.0    174.00     315.3      0
order:depth:roots      6      263.1     43.86      79.5      0
Residuals          23976    13233.1      0.55


We  note  that  all  main  effects  and  interactions  register  as  significant.  The  large  number  of  replicates 
here  means  even  quite  small  effects  will  be  detectable. 
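
The arithmetic behind the table can be checked directly from the reported sums of squares and degrees of freedom. The sketch below is only a consistency check, not a re-analysis of the data: each mean square is the sum of squares divided by its degrees of freedom, each F value is the term's mean square divided by the residual mean square, and the residual standard error is the square root of the residual mean square. The variable names are illustrative.

```python
import math

# (degrees of freedom, sum of squares) as reported in the ANOVA table.
terms = {
    "order": (3, 219167.2),
    "depth": (1, 7653.9),
    "roots": (2, 35494.6),
    "order:depth": (3, 148.3),
    "order:roots": (6, 167.1),
    "depth:roots": (2, 348.0),
    "order:depth:roots": (6, 263.1),
}
resid_df, resid_ss = 23976, 13233.1

ms_resid = resid_ss / resid_df  # residual mean square
f_values = {name: (ss / df) / ms_resid for name, (df, ss) in terms.items()}
resid_se = math.sqrt(ms_resid)  # residual standard error
total_df = sum(df for df, _ in terms.values()) + resid_df

for name, f in f_values.items():
    print(f"{name:18s} F = {f:10.1f}")
print("residual standard error:", round(resid_se, 4))
print("total degrees of freedom:", total_df)
```

The total of 23,999 degrees of freedom is one less than the 24,000 data points, as expected for a balanced full factorial design.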

Contrasts  of  the  Order  Factor 

Of significant interest to us was whether there really is a difference between the ‘total’ and ‘one’
levels of order or, for that matter, between ‘partial’ and ‘unord’. The Tukey HSD method was used
to test these contrasts. The results imply significance at the 0.01 level for all the pairwise
comparisons, with the ‘one’ level being slightly more time consuming for the algorithm than
‘total’, as expected:


# All of the diffs that are greater than lwr are significant.
#
> TukeyHSD(large.logaov, "order", ordered = TRUE, conf.level=0.01)
  Tukey multiple comparisons of means
    1% family-wise confidence level
    factor levels have been ordered

Fit: aov(formula = log(1 + cpu) ~ order * depth * roots, data = large)

$order
                    diff        lwr        upr
one-total      0.1748367  0.1706773  0.1789961
partial-total  2.4839996  2.4798403  2.4881590
unord-total    7.4876435  7.4834841  7.4918029
partial-one    2.3091629  2.3050036  2.3133223
unord-one      7.3128068  7.3086474  7.3169662
unord-partial  5.0036438  4.9994845  5.0078032

The story is more complicated than suggested by this test, since the interactions in the model are
significant; a more complete analysis would look at fixed combinations of the other factors.
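
Because the model was fit to log(1 + cpu), each Tukey difference is in log units and can be read as a multiplicative factor on 1 + cpu. The sketch below makes that conversion, assuming `log` in the model call is the natural logarithm (the S-Plus default); the dictionary simply restates the reported `diff` column, and the resulting factors compare geometric means, not arithmetic means.

```python
import math

# Differences in mean log(1 + cpu) between order levels, from the Tukey output.
tukey_diff = {
    "one-total": 0.1748367,
    "partial-total": 2.4839996,
    "unord-total": 7.4876435,
    "partial-one": 2.3091629,
    "unord-one": 7.3128068,
    "unord-partial": 5.0036438,
}

# exp(diff) is the ratio of geometric means of (1 + cpu) for the two levels.
ratios = {pair: math.exp(d) for pair, d in tukey_diff.items()}

for pair, ratio in ratios.items():
    print(f"{pair:14s} x{ratio:,.2f}")
```

Read this way, the ‘one’ versus ‘total’ difference corresponds to roughly a 19% increase, while the ‘unord’ versus ‘total’ difference corresponds to a factor of well over a thousand.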

Analysis,  Conclusions,  and  Future  Questions  for  Experimentation 

Several  things  are  evident  from  this  set  of  experiments. 

1. The feature of the plan library that has the most significant effect on the algorithm’s runtime is
the  ordering  constraints  within  the  plan  library,  followed  by  the  number  of  roots  in  the  plan 
library,  followed  by  the  actual  depth  of  the  plan  trees. 

2.  It  is  not  simply  the  number  of  ordering  constraints  that  is  important  to  runtime.  Since  ‘total’ 
and  ‘one’  have  the  same  overall  number  of  constraints  but  ‘one’  has  a  higher  average  runtime 
we  can  conclude  that  the  way  in  which  the  constraints  are  organized  has  a  significant  impact  on 
the algorithm’s runtime.

3. The data suggest that the average runtime for the algorithm scales linearly in the number of
roots in the plan library. This is very good news indeed for the PHATT algorithm.

4. The runtime data for the ‘partial’ ordering condition fell consistently between the ‘one’ and
‘unord’ cases. This confirms our hypothesis that plan libraries without ordering constraints
represent an upper bound or worst case for the ordering factor. This result can be used in
future experiments and also begins to set bounds on our understanding of the algorithm’s
performance.

There are a number of other experiments that should be performed to provide still more
information about the algorithm’s performance.

1. Running another ordering condition that represents the opposite of ‘one’ but still maintains the
same number of ordering constraints. In this case the last action of the plan would be ordered
after all of the other actions in the plan. Intuition suggests that this value for the ordering
factor should produce runtimes between ‘one’ and ‘unord’. This would provide still more
evidence that the number of ordering constraints is not the critical feature, but rather how they
constrain the plans in the library.

2. We need to do more tests at points between 100 and 1000 plan root nodes and even beyond
1000 in order to confirm the hypothesis that the algorithm is scaling linearly with the number of
root intentions in the plan library.

3. We need to test more plan libraries with greater depth. Intuition suggests that the algorithm
should scale exponentially in the depth of the plan trees, because the number of possible plans
grows as the number of layers of “or nodes” goes up. However, even if this is borne
out by our experiments, it should not be seen as a problem, since the depth of hierarchical
plans in most real-world applications is limited to a relatively small value (i.e., less than 10).

4. Further analysis is needed to understand why the variance of the algorithm is increasing as the
number of roots increases. This is common in many computer programs. Our current
hypothesis is that this is a result of some small linear searches embedded in the algorithm that
are linked to the number of roots in the plan; however, this has yet to be confirmed.

Post-run examination of those cases that had particularly bad runtimes and introspection about the
algorithm have also led to a very important conclusion and suggested a set of experiments that
should be run. We have found that those data points with particularly long runtimes for the ‘unord’
case had a large number of possible explanations for the observations. That is, since the ‘unord’
cases had no ordering constraints, when the system saw three actions that could contribute to a
single plan, it could not rule out the possibility that each of these actions contributed to a separate
instance of the root goal. In other words, the system was forced to consider the possibility that
multiple instances of the same goal were being performed because the actions were unordered.

The  hypothesis  of  the  critical  nature  of  the  number  of  possible  explanations  is  given  weight  by  our 
results  with  ‘one’  and  ‘total’.  Remember  that  all  of  the  actions  in  the  plan  library  are  unique.  This 
means  that  for  the  ‘one’  and  ‘total’  cases  there  was  in  fact  only  a  single  possible  explanation  for  the 
observed  actions  due  to  the  ordering  constraints.  As  a  result,  the  algorithm  was  never  required  to 
consider  the  possibility  that  the  agent  was  pursuing  multiple  instances  of  the  same  plan.  In  each  of 
these  cases,  each  plan  had  a  unique  first  action.  If  the  system  does  not  see  it  then  it  doesn’t  have  to 
consider  the  possibility  of  multiple  instances  of  the  root  intention. 
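
The growth in explanations for unordered plans can be illustrated with a deliberately simplified counting model. It is not PHATT's explanation-generation procedure: it assumes a single root with one method, unique and distinct observed actions, and that an action can begin a new plan instance only if nothing is required to precede it. The function name `count_explanations` and the predecessor-map representation are hypothetical.

```python
def count_explanations(observed, preds):
    """Count ways to group observed actions into plan instances.

    observed: distinct action names in the order they were seen.
    preds: maps each action to the set of actions that must precede it
           within the same plan instance.
    """
    def extend(i, instances):
        if i == len(observed):
            return 1  # every observation has been assigned
        action = observed[i]
        ways = 0
        # Option 1: the action continues an existing instance in which
        # all of its predecessors have already been seen.
        for k, inst in enumerate(instances):
            if action not in inst and preds[action] <= inst:
                grown = instances[:k] + (inst | {action},) + instances[k + 1:]
                ways += extend(i + 1, grown)
        # Option 2: the action starts a new instance, which is possible
        # only if no other action has to precede it.
        if not preds[action]:
            ways += extend(i + 1, instances + (frozenset({action}),))
        return ways

    return extend(0, ())


actions = ["a", "b", "c", "d"]
unord = {a: set() for a in actions}
one = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"a"}}
total_order = {"a": set(), "b": {"a"}, "c": {"b"}, "d": {"c"}}

for name, preds in [("unord", unord), ("one", one), ("total", total_order)]:
    print(name, count_explanations(actions, preds))
```

Under these assumptions, four unordered actions admit 15 groupings into instances, while the ‘one’ and ‘total’ constraints each leave exactly one, which mirrors the qualitative argument above.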

This  hypothesis  would  suggest  that  the  number  and  type  (early  in  the  plan  vs.  late  in  the  plan)  of 
ordering  constraints  in  the  plan  library  is  a  method  of  controlling  the  more  important  factor  in  the 
algorithm’s  runtime,  namely  the  number  of  possible  explanations  for  the  observed  actions.  This 
corresponds to our intuition that, as the system is forced to consider a larger number of possible
explanations for the observations, the runtime of the algorithm should increase. As a result, we
believe  that  one  of  the  most  important  experiments  that  needs  to  be  run  is  to  control  the  number  of 
possible  explanations  within  a  fixed  size  plan  library  and  see  the  impact  that  it  has. 

Conclusions  and  Future  Directions 

Our  project  has  achieved  all  of  its  major  objectives.  It  has  demonstrated  that  the  skeptical  systems 
approach  to  security  is  viable  for  limited  plan  libraries  covering  applications  like  the  contract 
management system we have used. Further, our experiments give us good reason to believe that the
critical  PHATT  intent  recognition  algorithm  is  scalable  to  whole  real  world  applications. 

However, our application of this technology to this domain has highlighted some limitations in the
intent recognition algorithm; addressing them would be a significant asset in future research work
and application to real-world domains. For example, PHATT:

1. does not effectively handle plans with looping,

2.  has  only  a  qualitative  temporal  model  and  would  benefit  from  a  quantitative  model, 

3. could benefit from a typing system for plan variables.

Finally,  more  work  should  be  done  to  understand  those  factors  that  impact  the  size  of  the 
explanation  space  captured  by  a  plan  library  and  the  effect  this  has  on  the  algorithm’s  runtime. 